{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "title"
   },
   "source": [
    "# Vertex SDK: Custom Training Tabular Regression Models for Online Prediction and Explainability"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "overview:custom,xai"
   },
   "source": [
    "## Overview\n",
    "\n",
    "\n",
    "This notebook demonstrates how to use the Vertex SDK to train and deploy a custom tabular regression model for online prediction with explanation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "dataset:custom,boston,lrg"
   },
   "source": [
    "### Dataset\n",
    "\n",
    "The dataset used for this notebook is the [Boston Housing Prices dataset](https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html). The version of the dataset you will use in this notebook is built into TensorFlow. The trained model predicts the median price of a house in units of 1K USD."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "objective:custom,training,online_prediction,xai"
   },
   "source": [
    "### Learning objectives\n",
    "\n",
    "In this notebook, you create a custom model from a Python script in a Google prebuilt Docker container using the Vertex SDK, and then do a prediction with explanations on the deployed model by sending data. You can alternatively create custom models using `gcloud` command-line tool or online using Cloud Console.\n",
    "\n",
    "The steps performed include:\n",
    "\n",
    "- Create a Vertex custom job for training a model.\n",
    "- Train a TensorFlow model.\n",
    "- Retrieve and load the model artifacts.\n",
    "- View the model evaluation.\n",
    "- Set explanation parameters.\n",
    "- Upload the model as a Vertex `Model` resource.\n",
    "- Deploy the `Model` resource to a serving `Endpoint` resource.\n",
    "- Make a prediction with explanation.\n",
    "- Undeploy the `Model` resource.\n",
    "\n",
    "Each learning objective will correspond to a __#TODO__ in this student lab notebook -- try to complete this notebook first and then review the [solution notebook](../solutions/sdk_custom_tabular_regression_online_explain.ipynb)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "install_aip:mbsdk"
   },
   "source": [
    "## Installation\n",
    "\n",
    "Install the latest version of Vertex SDK for Python."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 718
    },
    "id": "2bDZNtmyABMH",
    "outputId": "4466beee-031d-467e-b4cb-b13b23aee768"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: google-cloud-aiplatform in /opt/conda/lib/python3.7/site-packages (1.10.0)\n",
      "Collecting google-cloud-storage<2.0.0dev,>=1.32.0\n",
      "  Downloading google_cloud_storage-1.44.0-py2.py3-none-any.whl (106 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m106.8/106.8 KB\u001b[0m \u001b[31m6.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: google-cloud-bigquery<3.0.0dev,>=1.15.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-aiplatform) (2.34.0)\n",
      "Requirement already satisfied: packaging>=14.3 in /opt/conda/lib/python3.7/site-packages (from google-cloud-aiplatform) (21.3)\n",
      "Requirement already satisfied: proto-plus>=1.10.1 in /opt/conda/lib/python3.7/site-packages (from google-cloud-aiplatform) (1.20.3)\n",
      "Requirement already satisfied: google-api-core[grpc]<3.0.0dev,>=1.26.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-aiplatform) (2.5.0)\n",
      "Requirement already satisfied: google-auth<3.0dev,>=1.25.0 in /opt/conda/lib/python3.7/site-packages (from google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (2.6.0)\n",
      "Requirement already satisfied: protobuf>=3.12.0 in /opt/conda/lib/python3.7/site-packages (from google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (3.19.4)\n",
      "Requirement already satisfied: requests<3.0.0dev,>=2.18.0 in /opt/conda/lib/python3.7/site-packages (from google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (2.27.1)\n",
      "Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.52.0 in /opt/conda/lib/python3.7/site-packages (from google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (1.54.0)\n",
      "Requirement already satisfied: grpcio<2.0dev,>=1.33.2 in /opt/conda/lib/python3.7/site-packages (from google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (1.44.0)\n",
      "Requirement already satisfied: grpcio-status<2.0dev,>=1.33.2 in /opt/conda/lib/python3.7/site-packages (from google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (1.44.0)\n",
      "Requirement already satisfied: google-resumable-media<3.0dev,>=0.6.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-bigquery<3.0.0dev,>=1.15.0->google-cloud-aiplatform) (2.1.0)\n",
      "Requirement already satisfied: google-cloud-core<3.0.0dev,>=1.4.1 in /opt/conda/lib/python3.7/site-packages (from google-cloud-bigquery<3.0.0dev,>=1.15.0->google-cloud-aiplatform) (2.2.2)\n",
      "Requirement already satisfied: python-dateutil<3.0dev,>=2.7.2 in /opt/conda/lib/python3.7/site-packages (from google-cloud-bigquery<3.0.0dev,>=1.15.0->google-cloud-aiplatform) (2.8.2)\n",
      "Requirement already satisfied: six in /opt/conda/lib/python3.7/site-packages (from google-cloud-storage<2.0.0dev,>=1.32.0->google-cloud-aiplatform) (1.16.0)\n",
      "Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.7/site-packages (from packaging>=14.3->google-cloud-aiplatform) (3.0.7)\n",
      "Requirement already satisfied: cachetools<6.0,>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from google-auth<3.0dev,>=1.25.0->google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (5.0.0)\n",
      "Requirement already satisfied: pyasn1-modules>=0.2.1 in /opt/conda/lib/python3.7/site-packages (from google-auth<3.0dev,>=1.25.0->google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (0.2.7)\n",
      "Requirement already satisfied: rsa<5,>=3.1.4 in /opt/conda/lib/python3.7/site-packages (from google-auth<3.0dev,>=1.25.0->google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (4.8)\n",
      "Requirement already satisfied: google-crc32c<2.0dev,>=1.0 in /opt/conda/lib/python3.7/site-packages (from google-resumable-media<3.0dev,>=0.6.0->google-cloud-bigquery<3.0.0dev,>=1.15.0->google-cloud-aiplatform) (1.1.2)\n",
      "Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (3.3)\n",
      "Requirement already satisfied: charset-normalizer~=2.0.0 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (2.0.12)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (2021.10.8)\n",
      "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (1.26.8)\n",
      "Requirement already satisfied: cffi>=1.0.0 in /opt/conda/lib/python3.7/site-packages (from google-crc32c<2.0dev,>=1.0->google-resumable-media<3.0dev,>=0.6.0->google-cloud-bigquery<3.0.0dev,>=1.15.0->google-cloud-aiplatform) (1.15.0)\n",
      "Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /opt/conda/lib/python3.7/site-packages (from pyasn1-modules>=0.2.1->google-auth<3.0dev,>=1.25.0->google-api-core[grpc]<3.0.0dev,>=1.26.0->google-cloud-aiplatform) (0.4.8)\n",
      "Requirement already satisfied: pycparser in /opt/conda/lib/python3.7/site-packages (from cffi>=1.0.0->google-crc32c<2.0dev,>=1.0->google-resumable-media<3.0dev,>=0.6.0->google-cloud-bigquery<3.0.0dev,>=1.15.0->google-cloud-aiplatform) (2.21)\n",
      "Installing collected packages: google-cloud-storage\n",
      "Successfully installed google-cloud-storage-1.44.0\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "\n",
    "# Google Cloud Notebook\n",
    "if os.path.exists(\"/opt/deeplearning/metadata/env_version\"):\n",
    "    USER_FLAG = \"--user\"\n",
    "else:\n",
    "    USER_FLAG = \"\"\n",
    "\n",
    "! pip3 install --upgrade google-cloud-aiplatform $USER_FLAG"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "install_storage"
   },
   "source": [
    "Install the latest GA version of *google-cloud-storage* library as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "WrguqLRNABMJ",
    "outputId": "61eca76b-b797-4d31-db7d-9313bb605ee5"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: google-cloud-storage in /home/jupyter/.local/lib/python3.7/site-packages (1.44.0)\n",
      "Collecting google-cloud-storage\n",
      "  Downloading google_cloud_storage-2.1.0-py2.py3-none-any.whl (106 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m106.6/106.6 KB\u001b[0m \u001b[31m5.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: google-cloud-core<3.0dev,>=1.6.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-storage) (2.2.2)\n",
      "Requirement already satisfied: requests<3.0.0dev,>=2.18.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-storage) (2.27.1)\n",
      "Requirement already satisfied: google-auth<3.0dev,>=1.25.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-storage) (2.6.0)\n",
      "Requirement already satisfied: google-api-core<3.0dev,>=1.29.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-storage) (2.5.0)\n",
      "Requirement already satisfied: google-resumable-media>=1.3.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-storage) (2.1.0)\n",
      "Requirement already satisfied: protobuf in /opt/conda/lib/python3.7/site-packages (from google-cloud-storage) (3.19.4)\n",
      "Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.52.0 in /opt/conda/lib/python3.7/site-packages (from google-api-core<3.0dev,>=1.29.0->google-cloud-storage) (1.54.0)\n",
      "Requirement already satisfied: rsa<5,>=3.1.4 in /opt/conda/lib/python3.7/site-packages (from google-auth<3.0dev,>=1.25.0->google-cloud-storage) (4.8)\n",
      "Requirement already satisfied: pyasn1-modules>=0.2.1 in /opt/conda/lib/python3.7/site-packages (from google-auth<3.0dev,>=1.25.0->google-cloud-storage) (0.2.7)\n",
      "Requirement already satisfied: six>=1.9.0 in /opt/conda/lib/python3.7/site-packages (from google-auth<3.0dev,>=1.25.0->google-cloud-storage) (1.16.0)\n",
      "Requirement already satisfied: cachetools<6.0,>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from google-auth<3.0dev,>=1.25.0->google-cloud-storage) (5.0.0)\n",
      "Requirement already satisfied: google-crc32c<2.0dev,>=1.0 in /opt/conda/lib/python3.7/site-packages (from google-resumable-media>=1.3.0->google-cloud-storage) (1.1.2)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-cloud-storage) (2021.10.8)\n",
      "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-cloud-storage) (1.26.8)\n",
      "Requirement already satisfied: charset-normalizer~=2.0.0 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-cloud-storage) (2.0.12)\n",
      "Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-cloud-storage) (3.3)\n",
      "Requirement already satisfied: cffi>=1.0.0 in /opt/conda/lib/python3.7/site-packages (from google-crc32c<2.0dev,>=1.0->google-resumable-media>=1.3.0->google-cloud-storage) (1.15.0)\n",
      "Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /opt/conda/lib/python3.7/site-packages (from pyasn1-modules>=0.2.1->google-auth<3.0dev,>=1.25.0->google-cloud-storage) (0.4.8)\n",
      "Requirement already satisfied: pycparser in /opt/conda/lib/python3.7/site-packages (from cffi>=1.0.0->google-crc32c<2.0dev,>=1.0->google-resumable-media>=1.3.0->google-cloud-storage) (2.21)\n",
      "Installing collected packages: google-cloud-storage\n",
      "  Attempting uninstall: google-cloud-storage\n",
      "    Found existing installation: google-cloud-storage 1.44.0\n",
      "    Uninstalling google-cloud-storage-1.44.0:\n",
      "      Successfully uninstalled google-cloud-storage-1.44.0\n",
      "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
      "google-cloud-aiplatform 1.10.0 requires google-cloud-storage<2.0.0dev,>=1.32.0, but you have google-cloud-storage 2.1.0 which is incompatible.\u001b[0m\u001b[31m\n",
      "\u001b[0mSuccessfully installed google-cloud-storage-2.1.0\n"
     ]
    }
   ],
   "source": [
    "! pip3 install -U google-cloud-storage $USER_FLAG"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note**: Please ignore any incompatibility warnings and errors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "id": "install_tensorflow"
   },
   "outputs": [],
   "source": [
    "if os.getenv(\"IS_TESTING\"):\n",
    "    ! pip3 install --upgrade tensorflow $USER_FLAG"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "restart"
   },
   "source": [
    "### Restart the kernel\n",
    "\n",
    "Once you've installed the additional packages, you need to restart the notebook kernel so it can find the packages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "id": "6IaM2KsfABMM"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "if not os.getenv(\"IS_TESTING\"):\n",
    "    # Automatically restart kernel after installs\n",
    "    import IPython\n",
    "\n",
    "    app = IPython.Application.instance()\n",
    "    app.kernel.do_shutdown(True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set your project ID\n",
    "__If you don't know your project ID__, you may be able to get your project ID using gcloud."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "id": "set_project_id"
   },
   "outputs": [],
   "source": [
    "PROJECT_ID = \"[your-project-id]\"  # @param {type:\"string\"}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "autoset_project_id"
   },
   "outputs": [],
   "source": [
    "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n",
    "    # Get your GCP project id from gcloud\n",
    "    shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null\n",
    "    PROJECT_ID = shell_output[0]\n",
    "    print(\"Project ID:\", PROJECT_ID)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "set_gcloud_project_id",
    "outputId": "15af5c64-2524-4ed5-b15d-b4b22e5881e0"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Updated property [core/project].\n"
     ]
    }
   ],
   "source": [
    "! gcloud config set project $PROJECT_ID"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "region"
   },
   "source": [
    "#### Region\n",
    "\n",
    "You can also change the `REGION` variable, which is used for operations\n",
    "throughout the rest of this notebook.  Below are regions supported for Vertex AI. It is recommended to choose the region closest to you.\n",
    "\n",
    "- Americas: `us-central1`\n",
    "- Europe: `europe-west4`\n",
    "- Asia Pacific: `asia-east1`\n",
    "\n",
    "You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.\n",
    "\n",
    "Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "id": "wnurEkWYABMO"
   },
   "outputs": [],
   "source": [
    "REGION = \"us-central1\"  # @param {type: \"string\"}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "timestamp"
   },
   "source": [
    "#### Timestamp\n",
    "\n",
    "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a timestamp for each instance session, and append the timestamp onto the name of resources you create in this tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "id": "HpnbCFuwABMO"
   },
   "outputs": [],
   "source": [
    "from datetime import datetime\n",
    "\n",
    "TIMESTAMP = datetime.now().strftime(\"%Y%m%d%H%M%S\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "bucket:mbsdk"
   },
   "source": [
    "### Create a Cloud Storage bucket\n",
    "\n",
    "**The following steps are required, regardless of your notebook environment.**\n",
    "\n",
    "When you initialize the Vertex SDK for Python, you specify a Cloud Storage staging bucket. The staging bucket is where all the data associated with your dataset and model resources are retained across sessions.\n",
    "\n",
    "Set the name of your Cloud Storage bucket below. Bucket names must be globally unique across all Google Cloud projects, including those outsides of your organization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "id": "bucket"
   },
   "outputs": [],
   "source": [
    "BUCKET_NAME = \"gs://[your-project-id]\"  # @param {type:\"string\"}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "id": "autoset_bucket"
   },
   "outputs": [],
   "source": [
    "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"gs://[your-bucket-name]\":\n",
    "    BUCKET_NAME = \"gs://\" + PROJECT_ID + \"aip-\" + TIMESTAMP"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "create_bucket"
   },
   "source": [
    "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "YAgYCCxdABMQ",
    "outputId": "6a23e5ab-57e1-4a8f-d26d-ce0029cc5b1e"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Creating gs://qwiklabs-gcp-00-e8a3bcf2cce6/...\n"
     ]
    }
   ],
   "source": [
    "! gsutil mb -l $REGION $BUCKET_NAME"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "validate_bucket"
   },
   "source": [
    "Finally, validate access to your Cloud Storage bucket by examining its contents:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "id": "k3PVPN3DABMR"
   },
   "outputs": [],
   "source": [
    "! gsutil ls -al $BUCKET_NAME"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "setup_vars"
   },
   "source": [
    "### Set up variables\n",
    "\n",
    "Next, set up some variables used throughout the notebook.\n",
    "### Import libraries and define constants"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "id": "import_aip:mbsdk",
    "tags": []
   },
   "outputs": [],
   "source": [
    "import google.cloud.aiplatform as aip"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "init_aip:mbsdk"
   },
   "source": [
    "## Initialize Vertex SDK for Python\n",
    "\n",
    "Initialize the Vertex SDK for Python for your project and corresponding bucket."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "id": "xdTdzXHSABMS"
   },
   "outputs": [],
   "source": [
    "aip.init(project=PROJECT_ID, staging_bucket=BUCKET_NAME)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "accelerators:training,cpu,prediction,cpu,mbsdk"
   },
   "source": [
    "#### Set hardware accelerators\n",
    "\n",
    "You can set hardware accelerators for training and prediction.\n",
    "\n",
    "Set the variables `TRAIN_GPU/TRAIN_NGPU` and `DEPLOY_GPU/DEPLOY_NGPU` to use a container image supporting a GPU and the number of GPUs allocated to the virtual machine (VM) instance. For example, to use a GPU container image with 4 Nvidia Telsa K80 GPUs allocated to each VM, you would specify:\n",
    "\n",
    "    (aip.AcceleratorType.NVIDIA_TESLA_K80, 4)\n",
    "\n",
    "\n",
    "Otherwise specify `(None, None)` to use a container image to run on a CPU.\n",
    "\n",
    "Learn more [here](https://cloud.google.com/vertex-ai/docs/general/locations#accelerators) hardware accelerator support for your region\n",
    "\n",
    "*Note*: TF releases before 2.3 for GPU support will fail to load the custom model in this notebook. It is a known issue and fixed in TF 2.3 -- which is caused by static graph ops that are generated in the serving function. If you encounter this issue on your own custom models, use a container image for TF 2.3 with GPU support."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "id": "QgYCQJGKABMT"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import sys\n",
    "\n",
    "if os.getenv(\"IS_TESTING_TRAIN_GPU\"):\n",
    "    TRAIN_GPU, TRAIN_NGPU = (\n",
    "        aip.gapic.AcceleratorType.NVIDIA_TESLA_K80,\n",
    "        int(os.getenv(\"IS_TESTING_TRAIN_GPU\")),\n",
    "    )\n",
    "else:\n",
    "    TRAIN_GPU, TRAIN_NGPU = (None, None)\n",
    "\n",
    "if os.getenv(\"IS_TESTING_DEPLOY_GPU\"):\n",
    "    DEPLOY_GPU, DEPLOY_NGPU = (\n",
    "        aip.gapic.AcceleratorType.NVIDIA_TESLA_K80,\n",
    "        int(os.getenv(\"IS_TESTING_DEPLOY_GPU\")),\n",
    "    )\n",
    "else:\n",
    "    DEPLOY_GPU, DEPLOY_NGPU = (None, None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "container:training,prediction"
   },
   "source": [
    "#### Set pre-built containers\n",
    "\n",
    "Set the pre-built Docker container image for training and prediction.\n",
    "\n",
    "\n",
    "For the latest list, see [Pre-built containers for training](https://cloud.google.com/ai-platform-unified/docs/training/pre-built-containers).\n",
    "\n",
    "\n",
    "For the latest list, see [Pre-built containers for prediction](https://cloud.google.com/ai-platform-unified/docs/predictions/pre-built-containers)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "G3cX6vikABMT",
    "outputId": "23a0cd03-1e73-44e0-a8ea-afd32b527b77"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training: gcr.io/cloud-aiplatform/training/tf-cpu.2-1:latest None None\n",
      "Deployment: gcr.io/cloud-aiplatform/prediction/tf2-cpu.2-1:latest None None\n"
     ]
    }
   ],
   "source": [
    "if os.getenv(\"IS_TESTING_TF\"):\n",
    "    TF = os.getenv(\"IS_TESTING_TF\")\n",
    "else:\n",
    "    TF = \"2-1\"\n",
    "\n",
    "if TF[0] == \"2\":\n",
    "    if TRAIN_GPU:\n",
    "        TRAIN_VERSION = \"tf-gpu.{}\".format(TF)\n",
    "    else:\n",
    "        TRAIN_VERSION = \"tf-cpu.{}\".format(TF)\n",
    "    if DEPLOY_GPU:\n",
    "        DEPLOY_VERSION = \"tf2-gpu.{}\".format(TF)\n",
    "    else:\n",
    "        DEPLOY_VERSION = \"tf2-cpu.{}\".format(TF)\n",
    "else:\n",
    "    if TRAIN_GPU:\n",
    "        TRAIN_VERSION = \"tf-gpu.{}\".format(TF)\n",
    "    else:\n",
    "        TRAIN_VERSION = \"tf-cpu.{}\".format(TF)\n",
    "    if DEPLOY_GPU:\n",
    "        DEPLOY_VERSION = \"tf-gpu.{}\".format(TF)\n",
    "    else:\n",
    "        DEPLOY_VERSION = \"tf-cpu.{}\".format(TF)\n",
    "\n",
    "TRAIN_IMAGE = \"gcr.io/cloud-aiplatform/training/{}:latest\".format(TRAIN_VERSION)\n",
    "DEPLOY_IMAGE = \"gcr.io/cloud-aiplatform/prediction/{}:latest\".format(DEPLOY_VERSION)\n",
    "\n",
    "print(\"Training:\", TRAIN_IMAGE, TRAIN_GPU, TRAIN_NGPU)\n",
    "print(\"Deployment:\", DEPLOY_IMAGE, DEPLOY_GPU, DEPLOY_NGPU)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "machine:training,prediction"
   },
   "source": [
    "#### Set machine type\n",
    "\n",
    "Next, set the machine type to use for training and prediction.\n",
    "\n",
    "- Set the variables `TRAIN_COMPUTE` and `DEPLOY_COMPUTE` to configure  the compute resources for the VMs you will use for training and prediction.\n",
    " - `machine type`\n",
    "     - `n1-standard`: 3.75GB of memory per vCPU.\n",
    "     - `n1-highmem`: 6.5GB of memory per vCPU\n",
    "     - `n1-highcpu`: 0.9 GB of memory per vCPU\n",
    " - `vCPUs`: number of \\[2, 4, 8, 16, 32, 64, 96 \\]\n",
    "\n",
    "*Note: The following is not supported for training:*\n",
    "\n",
    " - `standard`: 2 vCPUs\n",
    " - `highcpu`: 2, 4 and 8 vCPUs\n",
    "\n",
    "*Note: You may also use n2 and e2 machine types for training and deployment, but they do not support GPUs*."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "-jSjJUrKABMU",
    "outputId": "0a9b6526-c23a-44a6-9d3e-5789e5afb454"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Train machine type n1-standard-4\n",
      "Deploy machine type n1-standard-4\n"
     ]
    }
   ],
   "source": [
    "if os.getenv(\"IS_TESTING_TRAIN_MACHINE\"):\n",
    "    MACHINE_TYPE = os.getenv(\"IS_TESTING_TRAIN_MACHINE\")\n",
    "else:\n",
    "    MACHINE_TYPE = \"n1-standard\"\n",
    "\n",
    "VCPU = \"4\"\n",
    "TRAIN_COMPUTE = MACHINE_TYPE + \"-\" + VCPU\n",
    "print(\"Train machine type\", TRAIN_COMPUTE)\n",
    "\n",
    "if os.getenv(\"IS_TESTING_DEPLOY_MACHINE\"):\n",
    "    MACHINE_TYPE = os.getenv(\"IS_TESTING_DEPLOY_MACHINE\")\n",
    "else:\n",
    "    MACHINE_TYPE = \"n1-standard\"\n",
    "\n",
    "VCPU = \"4\"\n",
    "DEPLOY_COMPUTE = MACHINE_TYPE + \"-\" + VCPU\n",
    "print(\"Deploy machine type\", DEPLOY_COMPUTE)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "tutorial_start:custom"
   },
   "source": [
    "## Tutorial\n",
    "\n",
    "Now you are ready to start creating your own custom model and training for Boston Housing."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "examine_training_package"
   },
   "source": [
    "### Examine the training package\n",
    "\n",
    "#### Package layout\n",
    "\n",
    "Before you start the training, you will look at how a Python package is assembled for a custom training job. When unarchived, the package contains the following directory/file layout.\n",
    "\n",
    "- PKG-INFO\n",
    "- README.md\n",
    "- setup.cfg\n",
    "- setup.py\n",
    "- trainer\n",
    "  - \\_\\_init\\_\\_.py\n",
    "  - task.py\n",
    "\n",
    "The files `setup.cfg` and `setup.py` are the instructions for installing the package into the operating environment of the Docker image.\n",
    "\n",
    "The file `trainer/task.py` is the Python script for executing the custom training job. *Note*, when referred to the worker pool specification, replace the directory slash with a dot (`trainer.task`) and dropped the file suffix (`.py`).\n",
    "\n",
    "#### Package Assembly\n",
    "\n",
    "In the following cells, you will assemble the training package."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "id": "cy1uCd6-ABMU"
   },
   "outputs": [],
   "source": [
    "# Make folder for Python training script\n",
    "! rm -rf custom\n",
    "! mkdir custom\n",
    "\n",
    "# Add package information\n",
    "! touch custom/README.md\n",
    "\n",
    "setup_cfg = \"[egg_info]\\n\\ntag_build =\\n\\ntag_date = 0\"\n",
    "! echo \"$setup_cfg\" > custom/setup.cfg\n",
    "\n",
    "setup_py = \"import setuptools\\n\\nsetuptools.setup(\\n\\n    install_requires=[\\n\\n        'tensorflow_datasets==1.3.0',\\n\\n    ],\\n\\n    packages=setuptools.find_packages())\"\n",
    "! echo \"$setup_py\" > custom/setup.py\n",
    "\n",
    "pkg_info = \"Metadata-Version: 1.0\\n\\nName: Boston Housing tabular regression\\n\\nVersion: 0.0.0\\n\\nSummary: Demostration training script\\n\\nHome-page: www.google.com\\n\\nAuthor: Google\\n\\nAuthor-email: aferlitsch@google.com\\n\\nLicense: Public\\n\\nDescription: Demo\\n\\nPlatform: Vertex\"\n",
    "! echo \"$pkg_info\" > custom/PKG-INFO\n",
    "\n",
    "# Make the training subfolder\n",
    "! mkdir custom/trainer\n",
    "! touch custom/trainer/__init__.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "taskpy_contents:boston"
   },
   "source": [
    "#### Task.py contents\n",
    "\n",
    "In the next cell, you write the contents of the training script task.py. I won't go into detail, it's just there for you to browse. In summary:\n",
    "\n",
    "- Get the directory where to save the model artifacts from the command line (`--model_dir`), and if not specified, then from the environment variable `AIP_MODEL_DIR`.\n",
    "- Loads Boston Housing dataset from TF.Keras builtin datasets\n",
    "- Builds a simple deep neural network model using TF.Keras model API.\n",
    "- Compiles the model (`compile()`).\n",
    "- Sets a training distribution strategy according to the argument `args.distribute`.\n",
    "- Trains the model (`fit()`) with epochs specified by `args.epochs`.\n",
    "- Saves the trained model (`save(args.model_dir)`) to the specified model directory.\n",
    "- Saves the maximum value for each feature `f.write(str(params))` to the specified parameters file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "XF7mjc4tABMV",
    "outputId": "e5c10618-7f9c-40e5-970f-3a804b846e11"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Writing custom/trainer/task.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile custom/trainer/task.py\n",
    "# Single, Mirror and Multi-Machine Distributed Training for Boston Housing\n",
    "\n",
    "import tensorflow_datasets as tfds\n",
    "import tensorflow as tf\n",
    "from tensorflow.python.client import device_lib\n",
    "import numpy as np\n",
    "import argparse\n",
    "import os\n",
    "import sys\n",
    "tfds.disable_progress_bar()\n",
    "\n",
    "parser = argparse.ArgumentParser()\n",
    "parser.add_argument('--model-dir', dest='model_dir',\n",
    "                    default=os.getenv('AIP_MODEL_DIR'), type=str, help='Model dir.')\n",
    "parser.add_argument('--lr', dest='lr',\n",
    "                    default=0.001, type=float,\n",
    "                    help='Learning rate.')\n",
    "parser.add_argument('--epochs', dest='epochs',\n",
    "                    default=20, type=int,\n",
    "                    help='Number of epochs.')\n",
    "parser.add_argument('--steps', dest='steps',\n",
    "                    default=100, type=int,\n",
    "                    help='Number of steps per epoch.')\n",
    "parser.add_argument('--distribute', dest='distribute', type=str, default='single',\n",
    "                    help='distributed training strategy')\n",
    "parser.add_argument('--param-file', dest='param_file',\n",
    "                    default='/tmp/param.txt', type=str,\n",
    "                    help='Output file for parameters')\n",
    "args = parser.parse_args()\n",
    "\n",
    "print('Python Version = {}'.format(sys.version))\n",
    "print('TensorFlow Version = {}'.format(tf.__version__))\n",
    "print('TF_CONFIG = {}'.format(os.environ.get('TF_CONFIG', 'Not found')))\n",
    "\n",
    "# Single Machine, single compute device\n",
    "if args.distribute == 'single':\n",
    "    if tf.test.is_gpu_available():\n",
    "        strategy = tf.distribute.OneDeviceStrategy(device=\"/gpu:0\")\n",
    "    else:\n",
    "        strategy = tf.distribute.OneDeviceStrategy(device=\"/cpu:0\")\n",
    "# Single Machine, multiple compute device\n",
    "elif args.distribute == 'mirror':\n",
    "    strategy = tf.distribute.MirroredStrategy()\n",
    "# Multiple Machine, multiple compute device\n",
    "elif args.distribute == 'multi':\n",
    "    strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()\n",
    "\n",
    "# Multi-worker configuration\n",
    "print('num_replicas_in_sync = {}'.format(strategy.num_replicas_in_sync))\n",
    "\n",
    "\n",
    "def make_dataset():\n",
    "\n",
    "  # Scaling Boston Housing data features\n",
    "  def scale(feature):\n",
    "    max = np.max(feature)\n",
    "    feature = (feature / max).astype(np.float)\n",
    "    return feature, max\n",
    "\n",
    "  (x_train, y_train), (x_test, y_test) = tf.keras.datasets.boston_housing.load_data(\n",
    "    path=\"boston_housing.npz\", test_split=0.2, seed=113\n",
    "  )\n",
    "  params = []\n",
    "  for _ in range(13):\n",
    "    x_train[_], max = scale(x_train[_])\n",
    "    x_test[_], _ = scale(x_test[_])\n",
    "    params.append(max)\n",
    "\n",
    "  # store the normalization (max) value for each feature\n",
    "  with tf.io.gfile.GFile(args.param_file, 'w') as f:\n",
    "    f.write(str(params))\n",
    "  return (x_train, y_train), (x_test, y_test)\n",
    "\n",
    "\n",
    "# Build the Keras model\n",
    "def build_and_compile_dnn_model():\n",
    "  model = tf.keras.Sequential([\n",
    "      tf.keras.layers.Dense(128, activation='relu', input_shape=(13,)),\n",
    "      tf.keras.layers.Dense(128, activation='relu'),\n",
    "      tf.keras.layers.Dense(1, activation='linear')\n",
    "  ])\n",
    "  model.compile(\n",
    "      loss='mse',\n",
    "      optimizer=tf.keras.optimizers.RMSprop(learning_rate=args.lr))\n",
    "  return model\n",
    "\n",
    "NUM_WORKERS = strategy.num_replicas_in_sync\n",
    "# Here the batch size scales up by number of workers since\n",
    "# `tf.data.Dataset.batch` expects the global batch size.\n",
    "BATCH_SIZE = 16\n",
    "GLOBAL_BATCH_SIZE = BATCH_SIZE * NUM_WORKERS\n",
    "\n",
    "with strategy.scope():\n",
    "  # Creation of dataset, and model building/compiling need to be within\n",
    "  # `strategy.scope()`.\n",
    "  model = build_and_compile_dnn_model()\n",
    "\n",
    "# Train the model\n",
    "(x_train, y_train), (x_test, y_test) = make_dataset()\n",
    "model.fit(x_train, y_train, epochs=args.epochs, batch_size=GLOBAL_BATCH_SIZE)\n",
    "model.save(args.model_dir)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "tarball_training_script"
   },
   "source": [
    "#### Store training script on your Cloud Storage bucket\n",
    "\n",
    "Next, you package the training folder into a compressed tar ball, and then store it in your Cloud Storage bucket."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "Jz6O2JjnABMV",
    "outputId": "2036478a-622b-4b7c-da7a-b24009e3ff79"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "custom/\n",
      "custom/setup.cfg\n",
      "custom/README.md\n",
      "custom/trainer/\n",
      "custom/trainer/__init__.py\n",
      "custom/trainer/task.py\n",
      "custom/setup.py\n",
      "custom/PKG-INFO\n",
      "Copying file://custom.tar.gz [Content-Type=application/x-tar]...\n",
      "/ [1 files][  2.0 KiB/  2.0 KiB]                                                \n",
      "Operation completed over 1 objects/2.0 KiB.                                      \n"
     ]
    }
   ],
   "source": [
    "! rm -f custom.tar custom.tar.gz\n",
    "! tar cvf custom.tar custom\n",
    "! gzip custom.tar\n",
    "! gsutil cp custom.tar.gz $BUCKET_NAME/trainer_boston.tar.gz"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "create_custom_training_job:mbsdk,no_model"
   },
   "source": [
    "### Create and run custom training job\n",
    "\n",
    "\n",
    "To train a custom model, you perform two steps: 1) create a custom training job, and 2) run the job.\n",
    "\n",
    "#### Create custom training job\n",
    "\n",
    "A custom training job is created with the `CustomTrainingJob` class, with the following parameters:\n",
    "\n",
    "- `display_name`: The human-readable name for the custom training job.\n",
    "- `container_uri`: The training container image.\n",
    "- `requirements`: Package requirements for the training container image (e.g., pandas).\n",
    "- `script_path`: The relative path to the training script."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "ybldYqDFABMW",
    "outputId": "ec39a6b7-86f1-4260-831a-125836452a97"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<google.cloud.aiplatform.training_jobs.CustomTrainingJob object at 0x7fd903dece90>\n"
     ]
    }
   ],
   "source": [
    "# Define your custom training job\n",
    "job = # TODO 1: Your code goes here(\n",
    "    display_name=\"boston_\" + TIMESTAMP,\n",
    "    script_path=\"custom/trainer/task.py\",\n",
    "    container_uri=TRAIN_IMAGE,\n",
    "    requirements=[\"gcsfs==0.7.1\", \"tensorflow-datasets==4.4\"],\n",
    ")\n",
    "\n",
    "print(job)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "prepare_custom_cmdargs"
   },
   "source": [
    "### Prepare your command-line arguments\n",
    "\n",
    "Now define the command-line arguments for your custom training container:\n",
    "\n",
    "- `args`: The command-line arguments to pass to the executable that is set as the entry point into the container.\n",
    "  - `--model-dir` : For demonstrations, this command-line argument is used to specify where to store the model artifacts.\n",
    "      - direct: You pass the Cloud Storage location as a command line argument to your training script (set variable `DIRECT = True`), or\n",
    "      - indirect: The service passes the Cloud Storage location as the environment variable `AIP_MODEL_DIR` to your training script (set variable `DIRECT = False`). In this case, you tell the service the model artifact location in the job specification.\n",
    "  - `\"--epochs=\" + EPOCHS`: The number of epochs for training.\n",
    "  - `\"--steps=\" + STEPS`: The number of steps per epoch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "id": "yjJKlsN0ABMW"
   },
   "outputs": [],
   "source": [
    "MODEL_DIR = \"{}/{}\".format(BUCKET_NAME, TIMESTAMP)\n",
    "\n",
    "EPOCHS = 20\n",
    "STEPS = 100\n",
    "\n",
    "DIRECT = True\n",
    "if DIRECT:\n",
    "    CMDARGS = [\n",
    "        \"--model-dir=\" + MODEL_DIR,\n",
    "        \"--epochs=\" + str(EPOCHS),\n",
    "        \"--steps=\" + str(STEPS),\n",
    "    ]\n",
    "else:\n",
    "    CMDARGS = [\n",
    "        \"--epochs=\" + str(EPOCHS),\n",
    "        \"--steps=\" + str(STEPS),\n",
    "    ]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "run_custom_job:mbsdk,no_model"
   },
   "source": [
    "#### Run the custom training job\n",
    "\n",
    "Next, you run the custom job to start the training job by invoking the method `run`, with the following parameters:\n",
    "\n",
    "- `args`: The command-line arguments to pass to the training script.\n",
    "- `replica_count`: The number of compute instances for training (replica_count = 1 is single node training).\n",
    "- `machine_type`: The machine type for the compute instances.\n",
    "- `accelerator_type`: The hardware accelerator type.\n",
    "- `accelerator_count`: The number of accelerators to attach to a worker replica.\n",
    "- `base_output_dir`: The Cloud Storage location to write the model artifacts to.\n",
    "- `sync`: Whether to block until completion of the job."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "D7Bubp6EABMX",
    "outputId": "4cf2216a-c65f-498e-ba51-82a95bee52ba"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:google.cloud.aiplatform.utils.source_utils:Training script copied to:\n",
      "gs://qwiklabs-gcp-00-e8a3bcf2cce6/aiplatform-2022-03-01-07:43:08.797-aiplatform_custom_trainer_script-0.1.tar.gz.\n",
      "INFO:google.cloud.aiplatform.training_jobs:Training Output directory:\n",
      "gs://qwiklabs-gcp-00-e8a3bcf2cce6/20220301073613 \n",
      "INFO:google.cloud.aiplatform.training_jobs:View Training:\n",
      "https://console.cloud.google.com/ai/platform/locations/us-central1/training/5928803091968163840?project=11810267454\n",
      "INFO:google.cloud.aiplatform.training_jobs:CustomTrainingJob projects/11810267454/locations/us-central1/trainingPipelines/5928803091968163840 current state:\n",
      "PipelineState.PIPELINE_STATE_PENDING\n",
      "INFO:google.cloud.aiplatform.training_jobs:CustomTrainingJob projects/11810267454/locations/us-central1/trainingPipelines/5928803091968163840 current state:\n",
      "PipelineState.PIPELINE_STATE_RUNNING\n",
      "INFO:google.cloud.aiplatform.training_jobs:View backing custom job:\n",
      "https://console.cloud.google.com/ai/platform/locations/us-central1/training/2525295634513133568?project=11810267454\n",
      "INFO:google.cloud.aiplatform.training_jobs:CustomTrainingJob projects/11810267454/locations/us-central1/trainingPipelines/5928803091968163840 current state:\n",
      "PipelineState.PIPELINE_STATE_RUNNING\n",
      "INFO:google.cloud.aiplatform.training_jobs:CustomTrainingJob projects/11810267454/locations/us-central1/trainingPipelines/5928803091968163840 current state:\n",
      "PipelineState.PIPELINE_STATE_RUNNING\n",
      "INFO:google.cloud.aiplatform.training_jobs:CustomTrainingJob projects/11810267454/locations/us-central1/trainingPipelines/5928803091968163840 current state:\n",
      "PipelineState.PIPELINE_STATE_RUNNING\n",
      "INFO:google.cloud.aiplatform.training_jobs:CustomTrainingJob run completed. Resource name: projects/11810267454/locations/us-central1/trainingPipelines/5928803091968163840\n",
      "WARNING:google.cloud.aiplatform.training_jobs:Training did not produce a Managed Model returning None. Training Pipeline projects/11810267454/locations/us-central1/trainingPipelines/5928803091968163840 is not configured to upload a Model. Create the Training Pipeline with model_serving_container_image_uri and model_display_name passed in. Ensure that your training script saves to model to os.environ['AIP_MODEL_DIR'].\n"
     ]
    }
   ],
   "source": [
    "if TRAIN_GPU:\n",
    "    job.run(\n",
    "        args=CMDARGS,\n",
    "        replica_count=1,\n",
    "        machine_type=TRAIN_COMPUTE,\n",
    "        accelerator_type=TRAIN_GPU.name,\n",
    "        accelerator_count=TRAIN_NGPU,\n",
    "        base_output_dir=MODEL_DIR,\n",
    "        sync=True,\n",
    "    )\n",
    "else:\n",
    "    job.run(\n",
    "        args=CMDARGS,\n",
    "        replica_count=1,\n",
    "        machine_type=TRAIN_COMPUTE,\n",
    "        base_output_dir=MODEL_DIR,\n",
    "        sync=True,\n",
    "    )\n",
    "\n",
    "model_path_to_deploy = MODEL_DIR"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "load_saved_model"
   },
   "source": [
    "## Load the saved model\n",
    "\n",
    "Your model is stored in a TensorFlow SavedModel format in a Cloud Storage bucket. Now load it from the Cloud Storage bucket, and then you can do some things, like evaluate the model, and do a prediction.\n",
    "\n",
    "To load, you use the TF.Keras `model.load_model()` method passing it the Cloud Storage path where the model is saved -- specified by `MODEL_DIR`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "4vwtdCuuABMX",
    "outputId": "4ea8dd0a-bea1-48ac-a912-bc854b974bc4"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), *NOT* tf.saved_model.save(). To confirm, there should be a file named \"keras_metadata.pb\" in the SavedModel directory.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2022-03-01 07:48:38.945060: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.\n"
     ]
    }
   ],
   "source": [
    "import tensorflow as tf\n",
    "\n",
    "local_model = tf.keras.models.load_model(MODEL_DIR)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "evaluate_custom_model:tabular"
   },
   "source": [
    "## Evaluate the model\n",
    "\n",
    "Now let's find out how good the model is.\n",
    "\n",
    "### Load evaluation data\n",
    "\n",
    "You will load the Boston Housing test (holdout) data from `tf.keras.datasets`, using the method `load_data()`. This returns the dataset as a tuple of two elements. The first element is the training data and the second is the test data. Each element is also a tuple of two elements: the feature data, and the corresponding labels (median value of owner-occupied home).\n",
    "\n",
    "You don't need the training data, and hence why it's loaded as `(_, _)`.\n",
    "\n",
    "Before you can run the data through evaluation, you need to preprocess it:\n",
    "\n",
    "`x_test`:\n",
    "1. Normalize (rescale) the data in each column by dividing each value by the maximum value of that column. This replaces every single value with a 32-bit floating-point number between 0 and 1."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "evaluate_custom_model:tabular,boston",
    "outputId": "9391c9f9-e9a0-4cfb-8847-e4acfa11179c"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/boston_housing.npz\n",
      "57344/57026 [==============================] - 0s 0us/step\n",
      "65536/57026 [==================================] - 0s 0us/step\n",
      "(102, 13) float32 (102,)\n",
      "scaled [0.02715405 0.         0.02717718 0.         0.00101952 0.00966066\n",
      " 0.15015015 0.0027548  0.03603604 1.         0.03033033 0.04091592\n",
      " 0.04361862]\n",
      "unscaled [[ 18.0846   0.      18.1      0.       0.679    6.434  100.       1.8347\n",
      "   24.     666.      20.2     27.25    29.05  ]]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "from tensorflow.keras.datasets import boston_housing\n",
    "\n",
    "(_, _), (x_test, y_test) = boston_housing.load_data(\n",
    "    path=\"boston_housing.npz\", test_split=0.2, seed=113\n",
    ")\n",
    "\n",
    "\n",
    "def scale(feature):\n",
    "    max = np.max(feature)\n",
    "    feature = (feature / max).astype(np.float32)\n",
    "    return feature\n",
    "\n",
    "\n",
    "# Let's save one data item that has not been scaled\n",
    "x_test_notscaled = x_test[0:1].copy()\n",
    "\n",
    "for _ in range(13):\n",
    "    x_test[_] = scale(x_test[_])\n",
    "x_test = x_test.astype(np.float32)\n",
    "\n",
    "print(x_test.shape, x_test.dtype, y_test.shape)\n",
    "print(\"scaled\", x_test[0])\n",
    "print(\"unscaled\", x_test_notscaled)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "perform_evaluation_custom"
   },
   "source": [
    "### Perform the model evaluation\n",
    "\n",
    "Now evaluate how well the model in the custom job did."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "bxPWL78YABMY",
    "outputId": "02ef93df-ca77-45d9-81e3-fc825d380a69"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2022-03-01 07:49:29.838428: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4/4 [==============================] - 0s 7ms/step - loss: 98.0168\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "98.01676177978516"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# TODO 2: Your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "serving_function_signature:xai"
   },
   "source": [
    "## Get the serving function signature\n",
    "\n",
    "You can get the signatures of your model's input and output layers by reloading the model into memory, and querying it for the signatures corresponding to each layer.\n",
    "\n",
    "When making a prediction request, you need to route the request to the serving function instead of the model, so you need to know the input layer name of the serving function -- which you will use later when you make a prediction request.\n",
    "\n",
    "You also need to know the name of the serving function's input and output layer for constructing the explanation metadata -- which is discussed subsequently."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "jeBb_vfgABMZ",
    "outputId": "b50654ac-5a12-4d4b-a600-f5993d1c4282"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Serving function input: dense_input\n",
      "Serving function output: dense_2\n",
      "Model input name: dense_input\n",
      "Model output name: dense_2/BiasAdd:0\n"
     ]
    }
   ],
   "source": [
    "loaded = tf.saved_model.load(model_path_to_deploy)\n",
    "\n",
    "serving_input = list(\n",
    "    loaded.signatures[\"serving_default\"].structured_input_signature[1].keys()\n",
    ")[0]\n",
    "print(\"Serving function input:\", serving_input)\n",
    "serving_output = list(loaded.signatures[\"serving_default\"].structured_outputs.keys())[0]\n",
    "print(\"Serving function output:\", serving_output)\n",
    "\n",
    "input_name = local_model.input.name\n",
    "print(\"Model input name:\", input_name)\n",
    "output_name = local_model.output.name\n",
    "print(\"Model output name:\", output_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "explanation_spec"
   },
   "source": [
    "### Explanation Specification\n",
    "\n",
    "To get explanations when doing a prediction, you must enable the explanation capability and set corresponding settings when you upload your custom model to a Vertex `Model` resource. These settings are referred to as the explanation metadata, which consists of:\n",
    "\n",
    "- `parameters`: This is the specification for the explainability algorithm to use for explanations on your model. You can choose between:\n",
    "  - Shapley - *Note*, not recommended for image data -- can be very long running\n",
    "  - XRAI\n",
    "  - Integrated Gradients\n",
    "- `metadata`: This is the specification for how the algorithm is applied on your custom model.\n",
    "\n",
    "#### Explanation Parameters\n",
    "\n",
    "Let's first dive deeper into the settings for the explainability algorithm.\n",
    "\n",
    "#### Shapley\n",
    "\n",
    "Assigns credit for the outcome to each feature, and considers different permutations of the features. This method provides a sampling approximation of exact Shapley values.\n",
    "\n",
    "Use Cases:\n",
    "  - Classification and regression on tabular data.\n",
    "\n",
    "Parameters:\n",
    "\n",
    "- `path_count`: This is the number of paths over the features that will be processed by the algorithm. An exact approximation of the Shapley values requires M! paths, where M is the number of features. For the CIFAR10 dataset, this would be 784 (28*28).\n",
    "\n",
    "For any non-trivial number of features, this is too compute expensive. You can reduce the number of paths over the features to M * `path_count`.\n",
    "\n",
    "#### Integrated Gradients\n",
    "\n",
    "A gradients-based method to efficiently compute feature attributions with the same axiomatic properties as the Shapley value.\n",
    "\n",
    "Use Cases:\n",
    "  - Classification and regression on tabular data.\n",
    "  - Classification on image data.\n",
    "\n",
    "Parameters:\n",
    "\n",
    "- `step_count`: This is the number of steps to approximate the remaining sum. The more steps, the more accurate the integral approximation. The general rule of thumb is 50 steps, but as you increase so does the compute time.\n",
    "\n",
    "#### XRAI\n",
    "\n",
    "Based on the integrated gradients method, XRAI assesses overlapping regions of the image to create a saliency map, which highlights relevant regions of the image rather than pixels.\n",
    "\n",
    "Use Cases:\n",
    "\n",
    "  - Classification on image data.\n",
    "\n",
    "Parameters:\n",
    "\n",
    "- `step_count`: This is the number of steps to approximate the remaining sum. The more steps, the more accurate the integral approximation. The general rule of thumb is 50 steps, but as you increase so does the compute time.\n",
    "\n",
    "In the next code cell, set the variable `XAI` to which explainabilty algorithm you will use on your custom model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "id": "explanation_parameters:mbsdk"
   },
   "outputs": [],
   "source": [
    "XAI = \"ig\"  # [ shapley, ig, xrai ]\n",
    "\n",
    "if XAI == \"shapley\":\n",
    "    PARAMETERS = {\"sampled_shapley_attribution\": {\"path_count\": 10}}\n",
    "elif XAI == \"ig\":\n",
    "    PARAMETERS = {\"integrated_gradients_attribution\": {\"step_count\": 50}}\n",
    "elif XAI == \"xrai\":\n",
    "    PARAMETERS = {\"xrai_attribution\": {\"step_count\": 50}}\n",
    "\n",
    "parameters = aip.explain.ExplanationParameters(PARAMETERS)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "explanation_metadata:tabular"
   },
   "source": [
    "#### Explanation Metadata\n",
    "\n",
    "Let's first dive deeper into the explanation metadata, which consists of:\n",
    "\n",
    "- `outputs`: A scalar value in the output to attribute -- what to explain. For example, in a probability output \\[0.1, 0.2, 0.7\\] for classification, one wants an explanation for 0.7. Consider the following formulae, where the output is `y` and that's what you want to understand.\n",
    "\n",
    "    y = f(x)\n",
    "\n",
    "Consider the following formulae, where the outputs are `y` and `z`. Since you can only do attribution for one scalar value, you have to pick whether you want to explain the output `y` or `z`. Assume in this example the model is object detection and y and z are the bounding box and the object classification. You would want to pick which of the two outputs to explain.\n",
    "\n",
    "    y, z = f(x)\n",
    "\n",
    "The dictionary format for `outputs` is:\n",
    "\n",
    "    { \"outputs\": { \"[your_display_name]\":\n",
    "                   \"output_tensor_name\": [layer]\n",
    "                 }\n",
    "    }\n",
    "\n",
    "<blockquote>\n",
    " -  [your_display_name]: A human-readable name you assign to the output to explain. A common example is \"probability\".<br/>\n",
    " -  \"output_tensor_name\": The key/value field to identify the output layer to explain. <br/>\n",
    " -  [layer]: The output layer to explain. In a single task model, like a tabular regressor, it is the last (topmost) layer in the model.\n",
    "</blockquote>\n",
    "\n",
    "- `inputs`: The features for attribution -- how they contributed to the output. Consider the following formulae, where `a` and `b` are the features. You have to pick the features that explain how they contributed. Assume that this model is deployed for A/B testing, where `a` are the data_items for the prediction and `b` identifies whether the model instance is A or B. You would want to pick `a` (or some subset of) for the features, and not `b` since it does not contribute to the prediction.\n",
    "\n",
    "    y = f(a,b)\n",
    "\n",
    "The minimum dictionary format for `inputs` is:\n",
    "\n",
    "    { \"inputs\": { \"[your_display_name]\":\n",
    "                  \"input_tensor_name\": [layer]\n",
    "                 }\n",
    "    }\n",
    "\n",
    "<blockquote>\n",
    " -  [your_display_name]: A human-readable name you assign to the input to explain. A common example is \"features\".<br/>\n",
    " -  \"input_tensor_name\": The key/value field to identify the input layer for the feature attribution. <br/>\n",
    " -  [layer]: The input layer for feature attribution. In a single input tensor model, it is the first (bottom-most) layer in the model.\n",
    "</blockquote>\n",
    "\n",
    "Since the inputs to the model are tabular, you can specify the following two additional fields as reporting/visualization aids:\n",
    "\n",
    "<blockquote>\n",
    " - \"modality\": \"image\": Indicates the field values are image data.\n",
    "</blockquote>\n",
    "\n",
    "Since the inputs to the model are tabular, you can specify the following two additional fields as reporting/visualization aids:\n",
    "\n",
    "<blockquote>\n",
    " - \"encoding\": \"BAG_OF_FEATURES\" : Indicates that the inputs are set of tabular features.<br/>\n",
    " - \"index_feature_mapping\": [ feature-names ] : A list of human-readable names for each feature. For this example, you use the feature names specified in the dataset.<br/>\n",
    " - \"modality\": \"numeric\": Indicates the field values are numeric.\n",
    "</blockquote>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "id": "explanation_metadata:mbsdk,tabular"
   },
   "outputs": [],
   "source": [
    "INPUT_METADATA = {\n",
    "    \"input_tensor_name\": serving_input,\n",
    "    \"encoding\": \"BAG_OF_FEATURES\",\n",
    "    \"modality\": \"numeric\",\n",
    "    \"index_feature_mapping\": [\n",
    "        \"crim\",\n",
    "        \"zn\",\n",
    "        \"indus\",\n",
    "        \"chas\",\n",
    "        \"nox\",\n",
    "        \"rm\",\n",
    "        \"age\",\n",
    "        \"dis\",\n",
    "        \"rad\",\n",
    "        \"tax\",\n",
    "        \"ptratio\",\n",
    "        \"b\",\n",
    "        \"lstat\",\n",
    "    ],\n",
    "}\n",
    "\n",
    "OUTPUT_METADATA = {\"output_tensor_name\": serving_output}\n",
    "\n",
    "input_metadata = aip.explain.ExplanationMetadata.InputMetadata(INPUT_METADATA)\n",
    "output_metadata = aip.explain.ExplanationMetadata.OutputMetadata(OUTPUT_METADATA)\n",
    "\n",
    "metadata = aip.explain.ExplanationMetadata(\n",
    "    inputs={\"features\": input_metadata}, outputs={\"medv\": output_metadata}\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "upload_model:mbsdk,xai"
   },
   "source": [
    "## Upload the model\n",
    "\n",
    "Next, upload your model to a `Model` resource using `Model.upload()` method, with the following parameters:\n",
    "\n",
    "- `display_name`: The human-readable name for the `Model` resource.\n",
    "- `artifact`: The Cloud Storage location of the trained model artifacts.\n",
    "- `serving_container_image_uri`: The serving container image.\n",
    "- `sync`: Whether to execute the upload asynchronously or synchronously.\n",
    "- `explanation_parameters`: Parameters to configure explaining for `Model`'s predictions.\n",
    "- `explanation_metadata`: Metadata describing the `Model`'s input and output for explanation.\n",
    "\n",
    "If the `upload()` method is run asynchronously, you can subsequently block until completion with the `wait()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "uI5WMF86ABMb",
    "outputId": "1dbb9524-3092-43e6-ce1d-2b69599d403e"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:google.cloud.aiplatform.models:Creating Model\n",
      "INFO:google.cloud.aiplatform.models:Create Model backing LRO: projects/11810267454/locations/us-central1/models/3277182367516590080/operations/5110007296843317248\n",
      "INFO:google.cloud.aiplatform.models:Model created. Resource name: projects/11810267454/locations/us-central1/models/3277182367516590080\n",
      "INFO:google.cloud.aiplatform.models:To use this Model in another session:\n",
      "INFO:google.cloud.aiplatform.models:model = aiplatform.Model('projects/11810267454/locations/us-central1/models/3277182367516590080')\n"
     ]
    }
   ],
   "source": [
    "model = # TODO 3: Your code goes here(\n",
    "    display_name=\"boston_\" + TIMESTAMP,\n",
    "    artifact_uri=MODEL_DIR,\n",
    "    serving_container_image_uri=DEPLOY_IMAGE,\n",
    "    explanation_parameters=parameters,\n",
    "    explanation_metadata=metadata,\n",
    "    sync=False,\n",
    ")\n",
    "\n",
    "model.wait()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "deploy_model:mbsdk,all"
   },
   "source": [
    "## Deploy the model\n",
    "\n",
    "Next, deploy your model for online prediction. To deploy the model, you invoke the `deploy` method, with the following parameters:\n",
    "\n",
    "- `deployed_model_display_name`: A human-readable name for the deployed model.\n",
    "- `traffic_split`: Percent of traffic at the endpoint that goes to this model, which is specified as a dictionary of one or more key/value pairs.\n",
    "If only one model, then specify as { \"0\": 100 }, where \"0\" refers to this model being uploaded and 100 means 100% of the traffic.\n",
    "If there are existing models on the endpoint, for which the traffic will be split, then use model_id to specify as { \"0\": percent, model_id: percent, ... }, where model_id is the model id of an existing model to the deployed endpoint. The percents must add up to 100.\n",
    "- `machine_type`: The type of machine to use for training.\n",
    "- `accelerator_type`: The hardware accelerator type.\n",
    "- `accelerator_count`: The number of accelerators to attach to a worker replica.\n",
    "- `starting_replica_count`: The number of compute instances to initially provision.\n",
    "- `max_replica_count`: The maximum number of compute instances to scale to. In this notebook, only one instance is provisioned.\n",
    "\n",
    "This can take **10-15** minutes to complete."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "-r8Y2uU5ABMb",
    "outputId": "ef548e4c-9b16-4700-c645-663618bed269"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:google.cloud.aiplatform.models:Creating Endpoint\n",
      "INFO:google.cloud.aiplatform.models:Create Endpoint backing LRO: projects/11810267454/locations/us-central1/endpoints/3881302434328346624/operations/2479905114458947584\n",
      "INFO:google.cloud.aiplatform.models:Endpoint created. Resource name: projects/11810267454/locations/us-central1/endpoints/3881302434328346624\n",
      "INFO:google.cloud.aiplatform.models:To use this Endpoint in another session:\n",
      "INFO:google.cloud.aiplatform.models:endpoint = aiplatform.Endpoint('projects/11810267454/locations/us-central1/endpoints/3881302434328346624')\n",
      "INFO:google.cloud.aiplatform.models:Deploying model to Endpoint : projects/11810267454/locations/us-central1/endpoints/3881302434328346624\n",
      "INFO:google.cloud.aiplatform.models:Deploy Endpoint model backing LRO: projects/11810267454/locations/us-central1/endpoints/3881302434328346624/operations/5686468049146740736\n",
      "INFO:google.cloud.aiplatform.models:Endpoint model deployed. Resource name: projects/11810267454/locations/us-central1/endpoints/3881302434328346624\n"
     ]
    }
   ],
   "source": [
    "DEPLOYED_NAME = \"boston-\" + TIMESTAMP\n",
    "\n",
    "TRAFFIC_SPLIT = {\"0\": 100}\n",
    "\n",
    "MIN_NODES = 1\n",
    "MAX_NODES = 1\n",
    "\n",
    "if DEPLOY_GPU:\n",
    "    endpoint = # TODO 4a: Your code goes here(\n",
    "        deployed_model_display_name=DEPLOYED_NAME,\n",
    "        traffic_split=TRAFFIC_SPLIT,\n",
    "        machine_type=DEPLOY_COMPUTE,\n",
    "        accelerator_type=DEPLOY_GPU,\n",
    "        accelerator_count=DEPLOY_NGPU,\n",
    "        min_replica_count=MIN_NODES,\n",
    "        max_replica_count=MAX_NODES,\n",
    "    )\n",
    "else:\n",
    "    endpoint = # TODO 4b: Your code goes here(\n",
    "        deployed_model_display_name=DEPLOYED_NAME,\n",
    "        traffic_split=TRAFFIC_SPLIT,\n",
    "        machine_type=DEPLOY_COMPUTE,\n",
    "        accelerator_type=DEPLOY_GPU,\n",
    "        accelerator_count=0,\n",
    "        min_replica_count=MIN_NODES,\n",
    "        max_replica_count=MAX_NODES,\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "get_test_item:test"
   },
   "source": [
    "### Get test item\n",
    "\n",
    "You will use an example out of the test (holdout) portion of the dataset as a test item."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "get_test_item:test,tabular",
    "outputId": "e522747f-865d-4660-d0d0-86bf79903032"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(13,)\n"
     ]
    }
   ],
   "source": [
    "test_item = x_test[0]\n",
    "test_label = y_test[0]\n",
    "print(test_item.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "explain_request:mbsdk,custom,lrg"
   },
   "source": [
    "### Make the prediction with explanation\n",
    "\n",
    "Now that your `Model` resource is deployed to an `Endpoint` resource, one can do online explanations by sending prediction requests to the `Endpoint` resource.\n",
    "\n",
    "#### Request\n",
    "\n",
    "The format of each instance is:\n",
    "\n",
    "    [feature_list]\n",
    "\n",
    "Since the explain() method can take multiple items (instances), send your single test item as a list of one test item.\n",
    "\n",
    "#### Response\n",
    "\n",
    "The response from the explain() call is a Python dictionary with the following entries:\n",
    "\n",
    "- `ids`: The internal assigned unique identifiers for each prediction request.\n",
    "- `predictions`: The prediction per instance.\n",
    "- `deployed_model_id`: The Vertex AI identifier for the deployed `Model` resource which did the predictions.\n",
    "- `explanations`: The feature attributions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "9F_uDQh1ABMc",
    "outputId": "d01c51ce-8f34-488d-d3c8-18cb9f184f84"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Prediction(predictions=[[0.530975223]], deployed_model_id='4586801871267561472', explanations=[attributions {\n",
      "  baseline_output_value: 0.47189685702323914\n",
      "  instance_output_value: 0.5309752225875854\n",
      "  feature_attributions {\n",
      "    struct_value {\n",
      "      fields {\n",
      "        key: \"age\"\n",
      "        value {\n",
      "          list_value {\n",
      "            values {\n",
      "              number_value: 0.01654371805489063\n",
      "            }\n",
      "          }\n",
      "        }\n",
      "      }\n",
      "      fields {\n",
      "        key: \"b\"\n",
      "        value {\n",
      "          list_value {\n",
      "            values {\n",
      "              number_value: 0.000878895225469023\n",
      "            }\n",
      "          }\n",
      "        }\n",
      "      }\n",
      "      fields {\n",
      "        key: \"chas\"\n",
      "        value {\n",
      "          list_value {\n",
      "            values {\n",
      "              number_value: 0.0\n",
      "            }\n",
      "          }\n",
      "        }\n",
      "      }\n",
      "      fields {\n",
      "        key: \"crim\"\n",
      "        value {\n",
      "          list_value {\n",
      "            values {\n",
      "              number_value: -0.002278252271935344\n",
      "            }\n",
      "          }\n",
      "        }\n",
      "      }\n",
      "      fields {\n",
      "        key: \"dis\"\n",
      "        value {\n",
      "          list_value {\n",
      "            values {\n",
      "              number_value: -0.0002662164333742112\n",
      "            }\n",
      "          }\n",
      "        }\n",
      "      }\n",
      "      fields {\n",
      "        key: \"indus\"\n",
      "        value {\n",
      "          list_value {\n",
      "            values {\n",
      "              number_value: -7.692919461987913e-05\n",
      "            }\n",
      "          }\n",
      "        }\n",
      "      }\n",
      "      fields {\n",
      "        key: \"lstat\"\n",
      "        value {\n",
      "          list_value {\n",
      "            values {\n",
      "              number_value: -0.01475213468074799\n",
      "            }\n",
      "          }\n",
      "        }\n",
      "      }\n",
      "      fields {\n",
      "        key: \"nox\"\n",
      "        value {\n",
      "          list_value {\n",
      "            values {\n",
      "              number_value: -2.91242431558203e-05\n",
      "            }\n",
      "          }\n",
      "        }\n",
      "      }\n",
      "      fields {\n",
      "        key: \"ptratio\"\n",
      "        value {\n",
      "          list_value {\n",
      "            values {\n",
      "              number_value: -0.001829853863455355\n",
      "            }\n",
      "          }\n",
      "        }\n",
      "      }\n",
      "      fields {\n",
      "        key: \"rad\"\n",
      "        value {\n",
      "          list_value {\n",
      "            values {\n",
      "              number_value: 0.0004318865831010044\n",
      "            }\n",
      "          }\n",
      "        }\n",
      "      }\n",
      "      fields {\n",
      "        key: \"rm\"\n",
      "        value {\n",
      "          list_value {\n",
      "            values {\n",
      "              number_value: 0.001415812992490828\n",
      "            }\n",
      "          }\n",
      "        }\n",
      "      }\n",
      "      fields {\n",
      "        key: \"tax\"\n",
      "        value {\n",
      "          list_value {\n",
      "            values {\n",
      "              number_value: 0.05838307365775108\n",
      "            }\n",
      "          }\n",
      "        }\n",
      "      }\n",
      "      fields {\n",
      "        key: \"zn\"\n",
      "        value {\n",
      "          list_value {\n",
      "            values {\n",
      "              number_value: 0.0\n",
      "            }\n",
      "          }\n",
      "        }\n",
      "      }\n",
      "    }\n",
      "  }\n",
      "  output_index: 0\n",
      "  approximation_error: 0.011129137855535957\n",
      "  output_name: \"medv\"\n",
      "}\n",
      "])\n"
     ]
    }
   ],
   "source": [
    "instances_list = [test_item.tolist()]\n",
    "\n",
    "prediction = # TODO 5: Your code goes here\n",
    "print(prediction)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "understand_explanations"
   },
   "source": [
    "### Understanding the explanations response\n",
    "\n",
    "First, you will look at what your model predicted and compare it to the actual value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "understand_explanations:mbsdk,boston",
    "outputId": "47681d46-1cda-4524-82ef-06ca0141d780"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Predicted Value: 0.530975223\n"
     ]
    }
   ],
   "source": [
    "value = prediction[0][0][0]\n",
    "print(\"Predicted Value:\", value)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "examine_feature_attributions",
    "tags": []
   },
   "source": [
    "### Examine feature attributions\n",
    "\n",
    "Next you will look at the feature attributions for this particular example. Positive attribution values mean a particular feature pushed your model prediction up by that amount, and vice versa for negative attribution values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "E0301 08:08:51.279392179   30619 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Collecting tabulate\n",
      "  Downloading tabulate-0.8.9-py3-none-any.whl (25 kB)\n",
      "Installing collected packages: tabulate\n",
      "Successfully installed tabulate-0.8.9\n"
     ]
    }
   ],
   "source": [
    "!python3 -m pip install tabulate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "examine_feature_attributions:mbsdk,boston",
    "outputId": "cc664bc6-a331-4340-9d2d-bd4befe9dede"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Feature name      Feature value  Attribution value\n",
      "--------------  ---------------  ------------------------\n",
      "crim                 0.0271541   [-0.002278252271935344]\n",
      "zn                   0           [0.0]\n",
      "indus                0.0271772   [-7.692919461987913e-05]\n",
      "chas                 0           [0.0]\n",
      "nox                  0.00101952  [-2.91242431558203e-05]\n",
      "rm                   0.00966066  [0.001415812992490828]\n",
      "age                  0.15015     [0.01654371805489063]\n",
      "dis                  0.0027548   [-0.0002662164333742112]\n",
      "rad                  0.036036    [0.0004318865831010044]\n",
      "tax                  1           [0.05838307365775108]\n",
      "ptratio              0.0303303   [-0.001829853863455355]\n",
      "b                    0.0409159   [0.000878895225469023]\n",
      "lstat                0.0436186   [-0.01475213468074799]\n"
     ]
    }
   ],
   "source": [
    "from tabulate import tabulate\n",
    "\n",
    "feature_names = [\n",
    "    \"crim\",\n",
    "    \"zn\",\n",
    "    \"indus\",\n",
    "    \"chas\",\n",
    "    \"nox\",\n",
    "    \"rm\",\n",
    "    \"age\",\n",
    "    \"dis\",\n",
    "    \"rad\",\n",
    "    \"tax\",\n",
    "    \"ptratio\",\n",
    "    \"b\",\n",
    "    \"lstat\",\n",
    "]\n",
    "attributions = prediction.explanations[0].attributions[0].feature_attributions\n",
    "\n",
    "rows = []\n",
    "for i, val in enumerate(feature_names):\n",
    "    rows.append([val, test_item[i], attributions[val]])\n",
    "print(tabulate(rows, headers=[\"Feature name\", \"Feature value\", \"Attribution value\"]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "check_explanations_baselines"
   },
   "source": [
    "### Check your explanations and baselines\n",
    "\n",
    "To better make sense of the feature attributions you're getting, you should compare them with your model's baseline. In most cases, the sum of your attribution values + the baseline should be very close to your model's predicted value for each input. Also note that for regression models, the `baseline_score` returned from AI Explanations will be the same for each example sent to your model. For classification models, each class will have its own baseline.\n",
    "\n",
    "In this section, you'll send 10 test examples to your model for prediction to compare the feature attributions with the baseline. Then you'll run each test example's attributions through a sanity check in the `sanity_check_explanations` method.\n",
    "\n",
    "#### Get explanations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "id": "check_explanations_baselines:mbsdk,boston"
   },
   "outputs": [],
   "source": [
    "# Prepare 10 test examples to your model for prediction\n",
    "instances = []\n",
    "for i in range(10):\n",
    "    instances.append(x_test[i].tolist())\n",
    "\n",
    "response = endpoint.explain(instances)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "sanity_check_explanations"
   },
   "source": [
    "#### Sanity check\n",
    "\n",
    "In the function below you perform a sanity check on the explanations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "qrm5J7UzABMe",
    "outputId": "7f506fff-dd45-4403-ffc3-a0755c610944"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "baseline: 0.47189685702323914\n",
      "Sanity Check 1: Passed\n",
      "1  out of  1  sanity checks passed.\n",
      "baseline: 0.47189685702323914\n",
      "Sanity Check 1: Passed\n",
      "1  out of  1  sanity checks passed.\n",
      "baseline: 0.47189685702323914\n",
      "Sanity Check 1: Passed\n",
      "1  out of  1  sanity checks passed.\n",
      "baseline: 0.47189685702323914\n",
      "Sanity Check 1: Passed\n",
      "1  out of  1  sanity checks passed.\n",
      "baseline: 0.47189685702323914\n",
      "Sanity Check 1: Passed\n",
      "1  out of  1  sanity checks passed.\n",
      "baseline: 0.47189685702323914\n",
      "Sanity Check 1: Passed\n",
      "1  out of  1  sanity checks passed.\n",
      "baseline: 0.47189685702323914\n",
      "Sanity Check 1: Passed\n",
      "1  out of  1  sanity checks passed.\n",
      "baseline: 0.47189685702323914\n",
      "Sanity Check 1: Passed\n",
      "1  out of  1  sanity checks passed.\n",
      "baseline: 0.47189685702323914\n",
      "Sanity Check 1: Passed\n",
      "1  out of  1  sanity checks passed.\n",
      "baseline: 0.47189685702323914\n",
      "Sanity Check 1: Passed\n",
      "1  out of  1  sanity checks passed.\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def sanity_check_explanations(\n",
    "    explanation, prediction, mean_tgt_value=None, variance_tgt_value=None\n",
    "):\n",
    "    passed_test = 0\n",
    "    total_test = 1\n",
    "    # `attributions` is a dict where keys are the feature names\n",
    "    # and values are the feature attributions for each feature\n",
    "    baseline_score = explanation.attributions[0].baseline_output_value\n",
    "    print(\"baseline:\", baseline_score)\n",
    "\n",
    "    # Sanity check 1\n",
    "    # The prediction at the input is equal to that at the baseline.\n",
    "    #  Please use a different baseline. Some suggestions are: random input, training\n",
    "    #  set mean.\n",
    "    if abs(prediction - baseline_score) <= 0.05:\n",
    "        print(\"Warning: example score and baseline score are too close.\")\n",
    "        print(\"You might not get attributions.\")\n",
    "    else:\n",
    "        passed_test += 1\n",
    "        print(\"Sanity Check 1: Passed\")\n",
    "\n",
    "    print(passed_test, \" out of \", total_test, \" sanity checks passed.\")\n",
    "\n",
    "\n",
    "i = 0\n",
    "for explanation in response.explanations:\n",
    "    try:\n",
    "        prediction = np.max(response.predictions[i][\"scores\"])\n",
    "    except TypeError:\n",
    "        prediction = np.max(response.predictions[i])\n",
    "    sanity_check_explanations(explanation, prediction)\n",
    "    i += 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "undeploy_model:mbsdk"
   },
   "source": [
    "## Undeploy the model\n",
    "\n",
    "When you are done doing predictions, you undeploy the model from the `Endpoint` resource. This deprovisions all compute resources and ends billing for the deployed model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "PILbWPnOABMf",
    "outputId": "2b775595-5863-4ca5-c8b2-33b74257d998"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:google.cloud.aiplatform.models:Undeploying Endpoint model: projects/11810267454/locations/us-central1/endpoints/3881302434328346624\n",
      "INFO:google.cloud.aiplatform.models:Undeploy Endpoint model backing LRO: projects/11810267454/locations/us-central1/endpoints/3881302434328346624/operations/2173660339797753856\n",
      "INFO:google.cloud.aiplatform.models:Endpoint model undeployed. Resource name: projects/11810267454/locations/us-central1/endpoints/3881302434328346624\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<google.cloud.aiplatform.models.Endpoint object at 0x7fd8c825d790> \n",
       "resource name: projects/11810267454/locations/us-central1/endpoints/3881302434328346624"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "endpoint.undeploy_all()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "cleanup:mbsdk"
   },
   "source": [
    "# Cleaning up\n",
    "\n",
    "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n",
    "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the notebook.\n",
    "\n",
    "Otherwise, you can delete the individual resources you created in this notebook:\n",
    "\n",
    "- Dataset\n",
    "- Pipeline\n",
    "- Model\n",
    "- Endpoint\n",
    "- AutoML Training Job\n",
    "- Batch Job\n",
    "- Custom Job\n",
    "- Hyperparameter Tuning Job\n",
    "- Cloud Storage Bucket"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "ghpfX6OcABMf",
    "outputId": "46e661e8-8892-405b-8190-1be567aa16a4"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:google.cloud.aiplatform.base:Deleting Model : projects/11810267454/locations/us-central1/models/3277182367516590080\n",
      "INFO:google.cloud.aiplatform.base:Delete Model  backing LRO: projects/11810267454/locations/us-central1/operations/7379821509038047232\n",
      "INFO:google.cloud.aiplatform.base:Model deleted. . Resource name: projects/11810267454/locations/us-central1/models/3277182367516590080\n",
      "INFO:google.cloud.aiplatform.base:Deleting Endpoint : projects/11810267454/locations/us-central1/endpoints/3881302434328346624\n",
      "INFO:google.cloud.aiplatform.base:Delete Endpoint  backing LRO: projects/11810267454/locations/us-central1/operations/1615213986003812352\n",
      "INFO:google.cloud.aiplatform.base:Endpoint deleted. . Resource name: projects/11810267454/locations/us-central1/endpoints/3881302434328346624\n",
      "INFO:google.cloud.aiplatform.base:Deleting CustomTrainingJob : projects/11810267454/locations/us-central1/trainingPipelines/5928803091968163840\n",
      "INFO:google.cloud.aiplatform.base:Delete CustomTrainingJob  backing LRO: projects/11810267454/locations/us-central1/operations/809069652704493568\n",
      "INFO:google.cloud.aiplatform.base:CustomTrainingJob deleted. . Resource name: projects/11810267454/locations/us-central1/trainingPipelines/5928803091968163840\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "E0301 08:09:31.567717661   30619 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Removing gs://qwiklabs-gcp-00-e8a3bcf2cce6/aiplatform-2022-03-01-07:43:08.797-aiplatform_custom_trainer_script-0.1.tar.gz#1646120588892907...\n",
      "Removing gs://qwiklabs-gcp-00-e8a3bcf2cce6/trainer_boston.tar.gz#1646120569533525...\n",
      "Removing gs://qwiklabs-gcp-00-e8a3bcf2cce6/20220301073613/#1646120731235392...  \n",
      "Removing gs://qwiklabs-gcp-00-e8a3bcf2cce6/20220301073613/assets/#1646120734096401...\n",
      "/ [4 objects]                                                                   \n",
      "==> NOTE: You are performing a sequence of gsutil operations that may\n",
      "run significantly faster if you instead use gsutil -m rm ... Please\n",
      "see the -m section under \"gsutil help options\" for further information\n",
      "about when gsutil -m can be advantageous.\n",
      "\n",
      "Removing gs://qwiklabs-gcp-00-e8a3bcf2cce6/20220301073613/saved_model.pb#1646120734519263...\n",
      "Removing gs://qwiklabs-gcp-00-e8a3bcf2cce6/20220301073613/variables/#1646120731485066...\n",
      "Removing gs://qwiklabs-gcp-00-e8a3bcf2cce6/20220301073613/variables/variables.data-00000-of-00001#1646120733196628...\n",
      "Removing gs://qwiklabs-gcp-00-e8a3bcf2cce6/20220301073613/variables/variables.index#1646120733448998...\n",
      "/ [8 objects]                                                                   \n",
      "Operation completed over 8 objects.                                              \n",
      "Removing gs://qwiklabs-gcp-00-e8a3bcf2cce6/...\n"
     ]
    }
   ],
   "source": [
    "delete_all = True\n",
    "\n",
    "if delete_all:\n",
    "    # Delete the dataset using the Vertex dataset object\n",
    "    try:\n",
    "        if \"dataset\" in globals():\n",
    "            dataset.delete()\n",
    "    except Exception as e:\n",
    "        print(e)\n",
    "\n",
    "    # Delete the model using the Vertex model object\n",
    "    try:\n",
    "        if \"model\" in globals():\n",
    "            model.delete()\n",
    "    except Exception as e:\n",
    "        print(e)\n",
    "\n",
    "    # Delete the endpoint using the Vertex endpoint object\n",
    "    try:\n",
    "        if \"endpoint\" in globals():\n",
    "            endpoint.delete()\n",
    "    except Exception as e:\n",
    "        print(e)\n",
    "\n",
    "    # Delete the AutoML or Pipeline trainig job\n",
    "    try:\n",
    "        if \"dag\" in globals():\n",
    "            dag.delete()\n",
    "    except Exception as e:\n",
    "        print(e)\n",
    "\n",
    "    # Delete the custom trainig job\n",
    "    try:\n",
    "        if \"job\" in globals():\n",
    "            job.delete()\n",
    "    except Exception as e:\n",
    "        print(e)\n",
    "\n",
    "    # Delete the batch prediction job using the Vertex batch prediction object\n",
    "    try:\n",
    "        if \"batch_predict_job\" in globals():\n",
    "            batch_predict_job.delete()\n",
    "    except Exception as e:\n",
    "        print(e)\n",
    "\n",
    "    # Delete the hyperparameter tuning job using the Vertex hyperparameter tuning object\n",
    "    try:\n",
    "        if \"hpt_job\" in globals():\n",
    "            hpt_job.delete()\n",
    "    except Exception as e:\n",
    "        print(e)\n",
    "\n",
    "    if \"BUCKET_NAME\" in globals():\n",
    "        ! gsutil rm -r $BUCKET_NAME"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "sdk_custom_tabular_regression_online_explain.ipynb",
   "provenance": [],
   "toc_visible": true
  },
  "environment": {
   "kernel": "python3",
   "name": "tf2-gpu.2-6.m90",
   "type": "gcloud",
   "uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-6:m90"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
